Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure the DB file is seekable, use a StringIO wrapper if it's not #38

Closed

Conversation

ikaronen-relex
Copy link
Contributor

This (hopefully) fixes #37

What has been tested:

  • Still works when not bundled in a .jar, does not use StringIO wrapper.
  • Works and uses StringIO wrapper when patched so that the is_seekable check always fails.

What has not (yet) been tested:

  • Does it actually work with a uri:classloader resource in a bundled .jar, with seek failing silently?
  • Does it work with a non-seekable DB file in other cases, where seek might raise an exception?

While I'm 99% confident that the answer to both questions is yes, I'd still like to do a bit more testing. But I wanted to get this PR out for review before the weekend.

(Also, I'm not sure if there are any potential newline and/or UTF-8 conversion issues on some platforms that might require opening the DB files explicitly in binary mode. If there were, I suspect the existing code would already be buggy on such platforms, since it's already seeking around in a file opened in text mode, but I'm still slightly worried and would like to see this tested better.)

@SamSaffron
Copy link
Member

Looks like an ok approach. @jeremy how do you feel about this? adds two tell calls and one seek, seems reasonable in the big scheme.

Can we get away with a slightly less paranoid implementation? (seek 1 (if that fails) not seekable, if tell gives me anything that is not 1 then it is not seekable)

Tests are failing so something is off.

@headius FYI

@ikaronen-relex
Copy link
Contributor Author

Can we get away with a slightly less paranoid implementation? (seek 1 (if that fails) not seekable, if tell gives me anything that is not 1 then it is not seekable)

That was actually more or less what I started with, but I figured one extra tell wouldn't hurt. AIUI tell is supposed to be a fast operation anyway. And since we're reading the first line anyway, I could actually test a real backwards seek after read for basically free.

(Now, if I was actually paranoid, I'd readline again after the seek and check that I get the same line back as first time…)

Tests are failing so something is off.

😬 Not much to go on in the test logs, but let me do some local testing and try to see what's going wrong.

@ikaronen-relex
Copy link
Contributor Author

In local testing, I'm getting a failure in test_full_parity_with_mime_types for the extension .webmanifest, which should return "application/manifest+json" according to MIME::Types. However, rake rebuild_db fixes the failure. Should I just include the rebuilt DB in the PR?

@Fryguy
Copy link
Contributor

Fryguy commented Aug 23, 2021

.webmanifest was merged in #36, so you probably just need a rebase.

@jeremy
Copy link
Collaborator

jeremy commented Aug 24, 2021 via email

@CvX CvX requested a review from SamSaffron January 6, 2022 12:00
@ikaronen-relex
Copy link
Contributor Author

ikaronen-relex commented Sep 30, 2022

It's been over a year. Could someone please review this PR? *nudge, nudge* We'd like to stop using a private fork of this gem but we still need this fix merged in order to do so. 🙄

assert_equal "application/vnd.groove-tool-message", MiniMime.lookup_by_filename("a.GTM").content_type
assert_equal "application/zip", MiniMime.lookup_by_extension("ZiP").content_type
assert_equal "application/vnd.groove-tool-message", MiniMime.lookup_by_filename("a.GTM")&.content_type
assert_equal "application/zip", MiniMime.lookup_by_extension("ZiP")&.content_type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm I don't understand these changes ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just making sure that if the lookup fails and returns nil we get a clean failure from the assert_equals instead of a NoMethodError from calling .content_type on nil. I think I ran into that while testing my code.

@@ -102,6 +102,16 @@ def initialize(path, sort_order)
@file_length = File.size(@path)
@rows = @file_length / @row_length

# Make sure the DB file is seekable, use a StringIO wrapper if it's not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole dance of checking makes me uneasy... I would be comfortable with a "reaction" to an error JRuby raises when you do a seek but it gives us nothing. I prefer not to own this heuristic.

Perhaps a simple:

MiniMime.preload_dbs = true

Then

if MiniMime.preload_dbs
   @file = StringIO.new(File.read(@path))
end

With some readme data. Very uneasy with the hack of ... seek around file and check that the computer did what you asked it... as a general solution. @headius - it should be exceptioning out if you tell it to seek and it does not seek, this is hiding bugs in JRuby and making them "normal".

Copy link
Contributor Author

@ikaronen-relex ikaronen-relex Oct 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be exceptioning out if you tell it to seek and it does not seek

I completely agree. However, if we want this code to be compatible with current and past JRuby behavior, we'd still need this "hack". At least the way I wrote it, it should continue working even if JRuby behavior is later changed to raise a reasonable exception (i.e. any IOError or SystemCallError) in this situation.

I suppose a comment explaining why this "dance" is needed might be worth including.

Copy link

@headius headius Oct 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not do anything special for seek. It will either use JVM functions to do this (SeekableChannel.position) or a native lseek call.

It does appear to error for an IO.pipe, for example, but I'm unsure if it is the right error:

$ jruby -e 'IO.pipe[0].seek(0)'
Errno::EPIPE: Broken pipe - No message available
    seek at org/jruby/RubyIO.java:1817
  <main> at -e:1

Also for stdin, with a similar error:

$ jruby -e '$stdin.seek(0)'
Errno::ESPIPE: Illegal seek - <STDIN>
    seek at org/jruby/RubyIO.java:1817
  <main> at -e:1

I am a bit confused which resources are not seekable but not raising an error. Just classloader resources? Some other file type for the DB? I think we can extend this error raising to those other resources types if someone can clarify which ones are failing to raise an error here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a JRuby patch that will raise EPIPE for all streams that do not appear to be seekable, including classloader resources:

[] ~/work/jruby93 $ rvm ruby-3.1 do ruby -e 'IO.pipe[0].seek(0)'
-e:1:in `seek': Illegal seek (Errno::ESPIPE)
	from -e:1:in `<main>'

[] ~/work/jruby93 $ rvm ruby-3.1 do ruby -e '$stdin.seek(0)'
-e:1:in `seek': Illegal seek @ rb_io_seek - <STDIN> (Errno::ESPIPE)
	from -e:1:in `<main>'

[] ~/work/jruby93 $ jruby -e 'p File.open("uri:classloader:jruby/kernel/kernel.rb").seek(1)'
Errno::EPIPE: Broken pipe - uri:classloader:jruby/kernel/kernel.rb
    seek at org/jruby/RubyIO.java:1817
  <main> at -e:1

This is a pretty aggressive patch, but might be necessary to make this work properly. By the time we do this seek call, all we have in hand is a java.nio.Channel, and we use the presence of the SeekableChannel supertype to know if seeking is possible. We opted for quiet failure at some point in the past (no idea why) if a channel was not seekable but also not selectable.

I have pushed this as jruby/jruby#7415 but we'll need to evaluate how risky it is before merging and releasing.

cc @enebo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@headius

I am a bit confused which resources are not seekable but not raising an error. Just classloader resources?

Classloader resources are what I've encountered this issue with, since that's what the DB files will end up being when a JRuby application using MiniMime is bundled into a .jar.

We opted for quiet failure at some point in the past (no idea why) if a channel was not seekable but also not selectable.

The silent failure was apparently originally implemented in jruby/jruby@a4cc5ee as a workaround for jruby/jruby#3399. I don't personally think it was a good workaround, but if this behavior is changed, we should probably make sure there's no regression there. (What probably should be done, IMO, would be to fix Rack to properly handle non-seekable resources.)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented on my PR as well, but to keep this up-to-date...

I agree that classloader resources, accessible only as InputStream objects that don't support seeking, should raise an error when you attempt to seek on them. If we can be sure this won't affect rack, it is an easy change (and we could narrow my PR to just the resource types in question). Until we can be sure of that, however, we are potentially trading one "broken" behavior for another "broken" behavior.

The simplest fix would be to make all unseekable resources in JRuby raise an error, but there are many resource types in JRuby that would never be encountered in CRuby such as IO resources from in-memory resources that can only be read forward. It is an arguable grey area what is appropriate to do for those.

headius added a commit to headius/jruby that referenced this pull request Oct 17, 2022
This relates to the discussion in discourse/mini_mime#38 where the
proposed patch must work around JRuby's silent seek failure for
some types of resources (at least classloader resources).

The patch here will now raise EPIPE for all NIO channels that are
not seekable through some means. It is rather aggressive given the
old logic that only raised if a channel was not seekable AND not
selectable (for reasons long lost).
casperisfine pushed a commit to casperisfine/mini_mime that referenced this pull request Aug 2, 2023
When forking, file descriptors are inherited and their state shared.

In the context of MiniMime this means that the offset of the file
opened by RandomAccessDb is shared across processes, so the
`seek + read` combo is subject to inter-process race conditions.

Of course that file is lazily opened, so assuming most applications
don't query MiniMime before fork, it's not a big problem.

However when reforking post boot (e.g. https://github.com/Shopify/pitchfork)
this becomes an issue.

Additionally, even if the file descriptor isn't shared across processes,
the file position is still process global requiring a Mutex.

By using `pread` instead of `seek + read` we can both make the library
fork safe and get rid of the need to synchronize accesses.

This also happens to fix an outstanding JRuby issue.

Fix: discourse#37
Fix: discourse#38
casperisfine pushed a commit to casperisfine/mini_mime that referenced this pull request Aug 2, 2023
When forking, file descriptors are inherited and their state shared.

In the context of MiniMime this means that the offset of the file
opened by RandomAccessDb is shared across processes, so the
`seek + read` combo is subject to inter-process race conditions.

Of course that file is lazily opened, so assuming most applications
don't query MiniMime before fork, it's not a big problem.

However when reforking post boot (e.g. https://github.com/Shopify/pitchfork)
this becomes an issue.

Additionally, even if the file descriptor isn't shared across processes,
the file position is still process global requiring a Mutex.

By using `pread` instead of `seek + read` we can both make the library
fork safe and get rid of the need to synchronize accesses.

This also happens to fix an outstanding JRuby issue.

Fix: discourse#37
Fix: discourse#38
SamSaffron pushed a commit that referenced this pull request Aug 3, 2023
When forking, file descriptors are inherited and their state shared.

In the context of MiniMime this means that the offset of the file
opened by RandomAccessDb is shared across processes, so the
`seek + read` combo is subject to inter-process race conditions.

Of course that file is lazily opened, so assuming most applications
don't query MiniMime before fork, it's not a big problem.

However when reforking post boot (e.g. https://github.com/Shopify/pitchfork)
this becomes an issue.

Additionally, even if the file descriptor isn't shared across processes,
the file position is still process global requiring a Mutex.

By using `pread` instead of `seek + read` we can both make the library
fork safe and get rid of the need to synchronize accesses.

This also happens to fix an outstanding JRuby issue.

Fix: #37
Fix: #38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Seeking the DB file does not work in a bundled JRuby application, crashes randomly
5 participants